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Acronyms 


Block random access memory (BRAM) 
Built-in-self-test (BIST) 

Combinatorial logic (CL) 

Commercial off the shelf (COTS) 

Complementary metal-oxide 
semiconductor (CMOS) 

Device under test (DUT) 

Digital Signal Processing Block (DSP) 

Distributed triple modular redundancy 
(DTMR) 

Edge-triggered flip-flops (DFFs) 

Field programmable gate array (FPGA) 

Global triple modular redundancy 
(GTMR) 

Joint test action group (JTAG) 

Input - output (I/O) 

Internal configuration access port (ICAP) 
Linear energy transfer (LET) 

Local triple modular redundancy (LTMR) 
Look up table (LUT) 

Microprocessor (MP) 


Operational frequency (fs) 

Processor (PC) 

Phase locked loop (PLL) 

Power on reset (POR) 

Probability of flip-flop upset (P_DFFSEU)
Probability of logic masking (P_logic)
Probability of transient generation (P_gen)
Probability of transient propagation (P_prop)

Radiation Effects and Analysis Group 
(REAG) 

Single event functional interrupt (SEFI) 
Single event latch-up (SEL) 

Single event transient (SET) 

Single event upset (SEU) 

Single event upset cross-section (σ_SEU)
Static random access memory (SRAM) 
System on a chip (SOC) 

Transient width (τ_width)

Triple modular redundancy (TMR) 
Universal Serial Bus (USB) 

Windowed Shift Register (WSR) 
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Overview 


• Review of FPGA Roadmap chart. 

• Work performed by NASA/GSFC: 

- Security and trust, 

- Xilinx Virtex-5 (commercial) heavy ion 
testing, 

- Xilinx Kintex-7 heavy ion testing, 

- Study of TMR mitigation techniques, 

• Plans for FY15 and out: 

- Microsemi, Xilinx, Altera, Synopsys.
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FPGA Security and Trust 


• Goal: Support the U.S. government 
concerns over security and trust in FPGAs 

• Conference participation: 

- Xilinx Security Working Group (XSWG) 2014 in 
Boulder/Longmont, CO. 

- Government Microcircuit Applications and 
Critical Technology Conference (GOMACTech) 
2015 in St. Louis, MO. 

- Hardened Electronics and Radiation 
Technology (HEART) 2015 in Chantilly, VA. 

• Collaboration with Aerospace Corporation 
and other agencies. 
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Review of FPGA Roadmap Chart: 
Field Programmable Gate Arrays (FPGAs)

Trusted FPGA
- DoD Development: TBD (track status)

Altera
- Stratix 5 (28nm TSMC process, commercial)
- Max 10 (55nm NOR based, commercial; small mission candidate)
- Stratix 10 (14nm Intel process, commercial)

Microsemi
- RTG4 (65nm RH)

Xilinx
- 7 series (28nm commercial)
- Ultrascale (20nm commercial, planar)
- Ultrascale+ (16nm commercial, vertical)
- Virtex 5QV (65nm RH)

[Chart: timeline from FY14 to FY17 with activity bars labeled Radiation Testing, Reliability Testing, Radiation and Reliability Testing, and Package Reliability Testing.]

FY=Fiscal Year
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Xilinx Virtex 5 Heavy Ion SEU Testing 

65nm bulk CMOS 
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Xilinx Virtex-5 FPGA Investigation 



• This was an independent study to determine the 
single event destructive and transient susceptibility 
of the Commercial Xilinx Virtex-5 device with special 
interest regarding its embedded PowerPC 440. 

• The FPGA-DUT was configured to have various test 
structures that were geared to measure specific types 
of Single Event Effect (SEE) susceptibilities of the 
device. 

• The DUT was monitored for single event transient
(SET), single event upset (SEU), and single event
latch-up (SEL) induced faults while the devices were
exposed to a heavy ion beam.

• Test strategies are based on the NEPP FPGA SEU
Test Guidelines manual:

https://nepp.nasa.gov/files/23779/fpga_radiation_test_guidelines_2012.pdf
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Test Facility Conditions 


• Facility: Texas A&M University Cyclotron Single
Event Effects Test Facility (25 MeV/amu tune).

• Flux: 50 to 10,000 particles/(cm²·s)

• Fluence: All tests were run to 1 x 10⁷ particles/cm²
or until destructive or functional events occurred.

• Test temperature: Room temperature 
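
As a rough illustration of what the stated flux range implies for beam time, the sketch below divides the target fluence by a constant flux. The function name and the constant-flux assumption are illustrative only and are not taken from the test plan; actual runs may stop earlier when destructive or functional events occur.

```python
def exposure_time_s(target_fluence: float, flux: float) -> float:
    """Seconds of beam needed to accumulate target_fluence (particles/cm^2)
    at a constant flux (particles/cm^2/s)."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return target_fluence / flux


TARGET_FLUENCE = 1e7  # particles/cm^2, the run limit quoted above

# The two ends of the stated flux range (particles/cm^2/s).
for flux in (50.0, 1e4):
    seconds = exposure_time_s(TARGET_FLUENCE, flux)
    print(f"flux={flux:g}: {seconds:g} s ({seconds / 3600:.2f} h)")
```

At the low end of the range a full-fluence run takes far longer than at the high end, which is one reason the flux is chosen per test rather than fixed.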
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Test Run Conditions 


• A total of 437 test runs were performed.

• Test structures utilized:

- Configuration, windowed shift registers (WSRs),
counters, PLL, BRAM+EDAC, and PowerPC 440.

• Flux could be kept low when required: under
100 particles/s.

• Flux selection: calculated from configuration SEU rates and
the speed of the scrubber.

- Hence, when starting tests at a particular LET, static
configuration tests were run first in order to calculate
configuration upset rates at a given flux.

- SEU rates should be lower than the scrub rate.

• Note: Some tests were run with the scrubber on and others with the
scrubber off in order to determine whether scrubbing would affect
the system SEU rate (non-mitigated system).
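
The flux-selection rule above can be sketched as follows, assuming the configuration upset rate scales linearly with flux and that "lower than the scrub rate" is read as at most one expected upset per full scrub cycle. The function names, the one-upset criterion, and all numbers are hypothetical and are not measured data.

```python
def cross_section_cm2(upsets: int, fluence: float) -> float:
    """Cross-section (cm^2/device) from a static configuration test:
    observed upsets divided by fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return upsets / fluence


def max_flux(sigma_cm2: float, scrub_period_s: float,
             upsets_per_scrub: float = 1.0) -> float:
    """Largest flux (particles/cm^2/s) that keeps the expected number of
    configuration upsets per scrub cycle at or below upsets_per_scrub."""
    if sigma_cm2 <= 0 or scrub_period_s <= 0:
        raise ValueError("sigma_cm2 and scrub_period_s must be positive")
    return upsets_per_scrub / (sigma_cm2 * scrub_period_s)


# Hypothetical static-test result and scrubber speed, for illustration only.
sigma = cross_section_cm2(upsets=1000, fluence=1e5)
print(max_flux(sigma, scrub_period_s=0.5))  # about 200 particles/cm^2/s
```

A stricter margin can be expressed by passing a smaller `upsets_per_scrub`.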
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Block Diagram Test Environment 

for PowerPC 440 



Arithmetic unit (APU); floating point unit (FPU); 
Processor local bus (PLB); 


[Block diagram: Virtex-5 DUT board with 16 Mbyte SRAM.]
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Results Summary 



• A new method for FPGA processor testing has been
developed by NEPP (presented at ETW 2014).

• The following are a few of the techniques that were
implemented for this test campaign:

- Fault visibility is increased by extracting internal processor
signals and feeding them to the low cost digital tester (LCDT).

- The LCDT places watchdog signals on these new observable
points.

- Watchdog failures are noted, time-stamped, and stored to the
host PC.

- The PC signals are also sent to a logic analyzer for real-time
observation during irradiation.

• Takeaway: the new method has been shown to increase visibility of
faults:

- SEU cross sections become more accurate.

- Component failure analysis is enhanced.
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Xilinx Kintex-7 SEL Testing 
high-k metal gate (HKMG) 
(TSMC 28nm HPL process) 


To be presented by Melanie Berg at the NASA Electronic Parts and Packaging Program (NEPP) Electronics Technology Workshop (ETW), NASA Goddard Space Flight Center in Greenbelt, MD, June 23-26, 2015.
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Xilinx Kintex-7 SEL Investigation 

• This is an independent study to determine the SEL
susceptibility of the commercial Xilinx Kintex-7 device.

• Prior SEL testing has been performed by other groups, which
reported observing SEL in the Xilinx 7-series devices.

- Lee, D.S.; Wirthlin, M.; Swift, G.; Le, A.C., "Single-Event
Characterization of the 28 nm Xilinx Kintex-7 Field-Programmable
Gate Array under Heavy Ion Irradiation," Radiation Effects Data
Workshop (REDW), 2014 IEEE, pp. 1-5, 14-18 July 2014

• NEPP decided to perform follow-up tests for validation.

- The NEPP test procedure was slightly different:

• Real-time configuration memory scrubbing during irradiation.

• Analog circuitry monitoring.

• A custom DUT board was designed to connect with the NEPP
LCDT.

- Note: SEL is identified by an increase in DUT current that can
only be lowered by reducing the DUT power below threshold.
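
The SEL criterion in the note can be approximated in post-processing of a logged current trace. The sketch below flags the first point where the current rises above a baseline by a chosen step and stays there for several consecutive samples. The function name, the step size, and the hold length are assumptions for illustration, not values from the NEPP procedure, and a real test would also confirm that only a power reduction clears the elevated current.

```python
from typing import Optional, Sequence


def first_sel_candidate(current_a: Sequence[float], baseline_a: float,
                        step_a: float, hold_samples: int) -> Optional[int]:
    """Return the index where the current first rises above
    baseline_a + step_a and stays there for hold_samples consecutive
    samples, or None if no such sustained step exists."""
    if hold_samples < 1:
        raise ValueError("hold_samples must be at least 1")
    limit = baseline_a + step_a
    run = 0  # length of the current streak of samples above the limit
    for i, amps in enumerate(current_a):
        run = run + 1 if amps > limit else 0
        if run == hold_samples:
            return i - hold_samples + 1
    return None


# Hypothetical IccAux samples in amps: a sustained step begins at index 3.
trace = [0.30, 0.31, 0.30, 0.55, 0.56, 0.55, 0.57]
print(first_sel_candidate(trace, baseline_a=0.30, step_a=0.10, hold_samples=3))
```

Requiring a sustained step, rather than a single high sample, keeps brief current spikes from being counted as latch-up.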
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IccAux (amps) 


Sample Kintex 7 SEL Data 

Graph courtesy of David Vail (Harris) 
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Kintex7 IccAux During Beam Exposure 


[Figure: IccAux (amps) during beam exposure, with traces for LET = 41.2, 20.6, and 11.4 MeV*cm²/mg.]

NASA SEL data agree with other groups' test data.

Kintex 7 SEL Cross-Section

Analysis performed by David Vail (Harris)

[Figure: Xilinx Kintex 7 SEL Cross-Sections with Weibull Fit. Vertical axis: cross-section (cm²).]
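
For readers unfamiliar with the Weibull fit named in the figure title, the sketch below evaluates the four-parameter Weibull form commonly used for SEE cross-section curves: zero below an onset LET, rising toward a saturation cross-section. The parameter values are hypothetical placeholders and are not the fit values from the Kintex-7 analysis.

```python
import math


def weibull_cross_section(let: float, sigma_sat: float, let_onset: float,
                          width: float, shape: float) -> float:
    """Cross-section (cm^2) at a given LET (MeV*cm^2/mg) for a
    four-parameter Weibull curve: sigma_sat * (1 - exp(-((LET - L0)/W)^s))."""
    if let <= let_onset:
        return 0.0  # below the onset LET no events are predicted
    x = (let - let_onset) / width
    return sigma_sat * (1.0 - math.exp(-(x ** shape)))


# Hypothetical parameters, for illustration only.
params = dict(sigma_sat=1e-4, let_onset=5.0, width=20.0, shape=2.0)
for let in (5.7, 11.4, 20.6, 41.2):
    print(f"LET={let}: {weibull_cross_section(let, **params):.3e} cm^2")
```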
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SRAM-based FPGA Mitigation Study 

(Triple Modular Redundancy (TMR) and Scrubbing) 
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Mitigation Study Overview 



• This is an independent study to determine the
effectiveness of various triple modular redundancy
(TMR) schemes implemented in SRAM-based FPGA
devices.

• TMR schemes are defined by what portion of the
circuit is triplicated and where the voters are placed.

- The strongest TMR implementation triplicates all data-paths
and contains separate voters for each data-path.

- However, this can be costly in area, power, and complexity.

- A trade study is performed to determine the TMR scheme that
meets project requirements with the least effort and circuitry.

• Presentation scope:

- Block TMR (BTMR), Local TMR (LTMR), Distributed TMR
(DTMR), Mixed TMR (PTMR).
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TMR Descriptions 

DFF: Edge-triggered flip-flop; CL: Combinatorial logic

TMR Nomenclature | Description | TMR Acronym
Block TMR | Entire design is triplicated. Voters are placed at the outputs. | BTMR
Local TMR | Only the DFFs are triplicated. Voters are placed after the DFFs. | LTMR
Distributed TMR | DFFs and CL-data-paths are triplicated. Similar to a design being triplicated but voters are placed after the DFFs. | DTMR
Global TMR | DFFs, CL-data-paths and global routes are triplicated. Voters are placed after the DFFs. | GTMR or XTMR


Note: It is suggested to separate (partition) TMR domains in SRAM 
based designs so that there are no overlapped shared resources. 
Shared resources become single points of failure. 
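
To make the voter and the single-point-of-failure remark concrete, here is a minimal software sketch of a bitwise two-of-three majority voter. It is an illustrative model only (the function name and the 4-bit example words are assumptions), not the voter logic inserted by any particular tool.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1010

# One domain upset: the voter masks the error.
print(bin(vote(0b1010, 0b1010, 0b0110)))

# A shared resource corrupts the same bit in two domains: the voter
# now outputs the wrong value, which is why domains are partitioned.
print(bin(vote(0b1011, 0b1011, golden)))
```

The second call shows how a fault common to two domains defeats the majority, which is the failure mode that partitioning is meant to avoid.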
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Results: Mitigation SEU Data 


[Figure: Mean Fluence to Failure (MFTF) for Various Mitigation Strategies. Vertical axis: MFTF. Series include PTMR.]

Unexpected note: MFTF for DTMR-No-Partition is near that of DTMR with Partition!

Mixed TMR (PTMR) has poor results: it used no-feedback DTMR and LTMR in some areas.
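
As an aid to reading the MFTF charts, the sketch below computes MFTF as the average fluence accumulated before failure over a set of runs, and takes its reciprocal as a rough cross-section estimate. Both the averaging over runs and the reciprocal relation are assumptions about how the metric is formed, and the run data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mftf(fluences_to_failure: Sequence[float]) -> float:
    """Mean fluence to failure (particles/cm^2) over a set of runs."""
    if not fluences_to_failure:
        raise ValueError("at least one run is required")
    return mean(fluences_to_failure)


def cross_section_estimate(fluences_to_failure: Sequence[float]) -> float:
    """Rough cross-section (cm^2) taken as the reciprocal of MFTF."""
    return 1.0 / mftf(fluences_to_failure)


# Hypothetical fluences at which three runs failed (particles/cm^2).
runs = [1e5, 2e5, 3e5]
print(mftf(runs), cross_section_estimate(runs))
```

A larger MFTF means the design survived more particles per failure, so a stronger mitigation scheme should move its bar upward.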


Results: Availability in Non-Flushable Designs 



[Figure: Availability: MFTF for BTMR versus Pure Counters. Series: BTMR, Pure Counters, BTMR One Out of Three. Horizontal axis: LET (MeV*cm²/mg), with values 5.7, 20.6, and 41.2.]

Reliability for BTMR-one-out-of-three can be less than for counters with no mitigation!

The common strategy is to reset the system upon the first block (component) error. This affects availability.
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Configuration Memory Scrubbing- 

Results: SEFIs 



• Two methods of scrubbing were performed:

- SelectMAP (direct from LCDT), and

- Internal configuration access port (ICAP) (signals
sourced from LCDT with feed-through to ICAP).

• Both use blind scrubbing and hence can correct
any number of configuration SEUs.

• A decade of difference in SEFIs was observed
when using ICAP.

[Figure: Configuration Scrubbing SEFI Cross-Sections: SelectMAP versus ICAP. Vertical axis: SEU cross-section (cm²).]

Plans for FY15 and out:
Microsemi, Xilinx, Altera, and Synopsys.
We Are Looking for Collaborators
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Microsemi RTG4 


• New Entry into the Aerospace Market with Space-grade
Expectation

- 65nm

• Custom daughter (DUT) cards are currently being built.
They are planned to be fabricated and populated by August.

• A prototype evaluation board will be purchased for early
design development.

• Phase I tests (date: fall 2015):

- Shift registers, counters, PLLs, and DSPs.

- Use of Synopsys tool for mitigation insertion.

• Phase II and Phase III tests (date TBD):

- High speed serial interfaces (XAUI, PCIe, SpaceWire, and
SpaceFibre), instantiated processor (TBD).

- Use of Synopsys tool for mitigation insertion.
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Altera Stratix V Radiation Test 

Development 

• New Entry into the Aerospace Market with COTS Expectation

- 28 nm bulk CMOS

• Custom daughter (DUT) cards are currently being built. They are
planned to be fabricated and populated by August.

• Evaluation boards have been purchased for early design
development and early latch-up testing.

• Phase I tests (date: June 2015):

- Evaluation board latch-up investigation.

• Phase II tests (TBD):

- Shift registers, counters, PLLs, and DSPs.

- Use of Synopsys tool for mitigation insertion.

• Phase III tests (TBD):

- High speed serial interfaces (TBD), instantiated processor (TBD).

- Use of Synopsys tool for mitigation insertion.
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Xilinx Kintex 7 UltraScale 


• New Entry into the Aerospace Market with COTS
Expectation

- 20 nm planar process (TSMC)

• A prototype evaluation board will be purchased for early
design development and early latch-up testing.

• Parts are not in hand but should arrive soon.

• Phase I tests (date: fall 2015 or spring 2016):

- Evaluation board latch-up investigation.

• Phase II tests (date TBD):

- Shift registers, counters, PLLs, and DSPs.

- Use of Synopsys tool for mitigation insertion.

• Phase III tests (TBD):

- High speed serial interfaces (TBD), embedded processors.
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Xilinx Zynq UltraScale+ 



• New Entry into the Aerospace Market with COTS 
Expectation 

- 16 nm vertical process (TSMC) 

• Multi-Processor System on a Chip (MPSoC) family. 

• Prototype evaluation board will be purchased for early 
design development and early latch-up testing. 

• Planning to receive parts in spring of 2016. 

• Custom daughter (DUT) cards will be built (date TBD). 

• Phase I tests (date TBD): 

- Evaluation board latch-up investigation. 

• Phase II tests (date TBD): 

- Shift registers, counters, PLLs, and DSPs. 

- Use of Synopsys tool for mitigation insertion.

• Phase III tests (TBD): 

- High speed serial interfaces (TBD), embedded processors. 
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BACKUP CHARTS 
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Block Triple Modular Redundancy: BTMR 


[Diagram: triplicated design copies (labels include Copy 1 and Copy 2) with voting at the block outputs.]

Voting is only at the outputs of complex blocks, so BTMR can only mask errors.

3x the error rate with triplication and no correction/flushing.

Feedback to the DFFs is needed in order to correct.

• Cannot apply internal correction from voted outputs.

• If blocks are not regularly flushed (e.g., reset), errors can
accumulate, so BTMR may not be an effective technique.
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When BTMR Works: Examples of Flushable 

BTMR Designs 

• Shift registers,

• Finite impulse response (FIR) filters,

• Transmission channels: it is typical for transmission
channels to send and reset after every sent packet,

• Lock-step microprocessors that have relaxed
requirements such that the microprocessors can be reset
(or power-cycled) every so often.


[Diagram: flushable transmission channel example.]
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If The System Is Not Flushable, Then 
BTMR May Not Provide The Expected 

Level of Mitigation 

• With a BTMR scheme, there is no correction, just
masking.

- Voters have no feedback.

- Voters need to reach DFFs in order to perform correction.

• BTMR can work well as a mitigation scheme if the
expected MTTF >> expected (or required) window of
correct operation.

• But... if the expected time to failure for one block is
less than the required full-liveliness window, then
BTMR doesn't buy you anything.

• If not thought out well, BTMR can actually be a
detriment: complexity, power, and area, and a false
sense of performance.
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Explanation of BTMR Strength and Weakness 
using Classical Reliability Models 



[Figure: reliability versus time, showing the reliability for 1 block (R_block), the reliability for BTMR (R_BTMR), and the mean time to failure for 1 block.]
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Operating in this time 
interval will provide a 
slight increase in 
reliability. 

However, it will provide a 
relatively hard design. 
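
The classical model behind this comparison can be sketched numerically. The code below assumes a constant failure rate per block (exponential reliability), independent blocks, a perfect voter, and no flushing or repair; these are textbook assumptions, not measured properties of the devices discussed here, and the failure rate is a hypothetical value.

```python
import math


def r_block(t: float, failure_rate: float) -> float:
    """Reliability of one unmitigated block at time t: exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


def r_btmr(t: float, failure_rate: float) -> float:
    """Reliability of a 2-of-3 voted system without repair: 3R^2 - 2R^3."""
    r = r_block(t, failure_rate)
    return 3.0 * r ** 2 - 2.0 * r ** 3


def crossover_time(failure_rate: float) -> float:
    """Time at which BTMR stops beating a single block (R = 0.5),
    about 0.69 of the single-block mean time to failure."""
    return math.log(2.0) / failure_rate


lam = 1.0e-3  # hypothetical failure rate per unit time
for fraction_of_mttf in (0.1, 0.5, 1.0):
    t = fraction_of_mttf / lam
    print(fraction_of_mttf, round(r_block(t, lam), 4), round(r_btmr(t, lam), 4))
```

Under these assumptions BTMR is more reliable than a single block only for operating windows shorter than the crossover time, which is the interval the slide refers to.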
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What Should be Done If Availability 
Needs to be Increased? 



• If the blocks within the BTMR have a relatively high upset
rate with respect to the required operational window, then
stronger mitigation must be implemented.

• Bring the voting/correcting inside of the modules: bring
the voting to the module DFFs.


The following slides illustrate the various forms of TMR that 

include voter insertion in the data-path. 


DFF: Edge-triggered flip-flop; CL: Combinatorial logic

TMR Nomenclature | Description | TMR Acronym
Local TMR | DFFs are triplicated | LTMR
Distributed TMR | DFFs and CL-data-paths are triplicated | DTMR
Global TMR | DFFs, CL-data-paths and global routes are triplicated | GTMR or XTMR


To be presented by Melanie Berg at the NASA Electronic Parts and Packaging Program (NEPP) Electronics Technology Workshop (ETW), NASA Goddard Space Flight Center in Greenbelt, MD, June 23-26, 2015.

Local Triple Modular Redundancy (LTMR) 

• Only DFFs are triplicated. Data-paths are kept singular. 

• LTMR masks upsets from DFFs and corrects DFF upsets if feedback is 
used. 

Good for devices where DFFs are most
susceptible and configuration and CL
susceptibility is insignificant; e.g.,
Microsemi ProASIC3.
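
The difference between masking and correcting can be illustrated with a small software model of a triplicated holding register. In this sketch an upset flips bits in one copy between clock edges; with feedback, every copy reloads the voted value on each clock, while without feedback each copy keeps its own state. The function names, the 4-bit register, and the upset sequence are illustrative assumptions, not a description of any specific device.

```python
from typing import List, Optional, Tuple


def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def run(upsets: List[Optional[Tuple[int, int]]], feedback: bool,
        initial: int = 0b1010) -> int:
    """Simulate one clock per list entry. Each entry is None or
    (copy_index, bit_mask) describing an upset before that clock edge.
    Returns the voted output after the last clock."""
    copies = [initial] * 3
    for upset in upsets:
        if upset is not None:
            index, mask = upset
            copies[index] ^= mask          # flip the struck bits
        if feedback:
            voted = vote(*copies)
            copies = [voted] * 3           # voter output reloads every DFF
    return vote(*copies)


# The same bit is struck in copy 0, then later in copy 1.
sequence = [(0, 0b0001), None, (1, 0b0001)]
print(bin(run(sequence, feedback=True)))   # upsets corrected between strikes
print(bin(run(sequence, feedback=False)))  # upsets accumulate and outvote
```

With feedback the first upset is repaired before the second arrives; without it, two corrupted copies outvote the good one.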
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LTMR Should Not Be Used in An 
SRAM Based FPGA 


[Diagram: LTMR in an SRAM-based FPGA, showing look up tables (LUTs) and a voter.]

There are too many other configuration bits and logic that can be
corrupted by an SEU. Mitigation needs to cover more than the
DFFs.
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Distributed Triple Modular Redundancy (DTMR) 

• Triplicate all data-paths and add voters after DFFs.

• DTMR masks upsets from configuration + DFFs + CL and corrects
captured upsets if feedback is used.


Good for devices where configuration or DFFs + CL are more
susceptible than project requirements allow; e.g., Xilinx and Altera
commercial FPGAs.
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Global Triple Modular Redundancy (GTMR) 

• Triplicate all clocks and data-paths, and add voters after DFFs.

• GTMR has the same level of protection as DTMR; however, it also
protects clock domains.



Good for devices where configuration or DFFs + CL are more
susceptible than project requirements allow; e.g., Xilinx and Altera
commercial FPGAs.



[Figure: error-probability terms for configuration and functional logic, each annotated as low.]




Theoretically, GTMR Is The Strongest 
Mitigation Strategy... BUT... 



• Triplicating a design and its global routes takes up a
lot of power and area.

• Generally performed after synthesis by a tool; not
part of the RTL.

• Skew between clock domains must be minimized such
that it is less than the feedback delay from a voter to its
associated DFF:


- Does the FPGA contain enough low skew clock
trees? (each clock + its synchronized reset) x 3.

- Limit skew of clocks coming into the FPGA.

- Limit skew of clocks from their input pin to their
clock tree.


• Difficult to verify.
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When Using TMR in an SRAM Based 
FPGA, Partitions Should Be Used 



SRAM based FPGAs use a significant 
number of shared resources; e.g., 
routing matrices. 

A resource that is shared across 
separate TMR domains can break the 
TMR scheme if hit by an SEU. 

Solution is to partition the TMR 
domains such that they do not share 
resources. 

Difficult: 

- Significantly increases area requirements, 

- Significantly reduces performance, and 

- It’s getting worse with new generations of 
devices. 



Name TMR domains 
with unique identifier 
for easier floor- 
planning. 